System 2 as working-memory augmented System 1 reasoning
post by Kaj_Sotala · 2019-09-25T08:39:08.011Z · LW · GW · 23 comments
The terms System 1 and System 2 were originally coined by the psychologist Keith Stanovich and then popularized by Daniel Kahneman in his book Thinking, Fast and Slow. Stanovich noted that a number of fields within psychology had been developing various kinds of theories distinguishing between fast/intuitive on the one hand and slow/deliberative thinking on the other. Often these fields were not aware of each other. The S1/S2 model was offered as a general version of these specific theories, highlighting features of the two modes of thought that tended to appear in all the theories.
Since then, academics have continued to discuss the models. Among other developments, Stanovich and other authors have discontinued the use of the System 1/System 2 terminology as misleading, choosing to instead talk about Type 1 and Type 2 processing. In this post, I will build on some of that discussion to argue that Type 2 processing is a particular way of chaining together the outputs of various subagents using working memory. Some of the processes involved in this chaining are themselves implemented by particular kinds of subagents.
This post has three purposes:
- Summarize some of the discussion about the dual process model that has taken place in recent years; in particular, the move to abandon the System 1/System 2 terminology.
- Connect the framework of thought that I have been developing in my multi-agent minds sequence with dual-process models.
- Push back on some popular interpretations of S1/S2 theory which I have been seeing on LW and elsewhere: ones in which the two systems are viewed as entirely distinct, ones in which S1 is viewed as biased and S2 as logical, and ones in which it makes sense to identify more as one system or the other.
Let’s start by looking at some criticism of the S1/S2 model endorsed by the person who coined the terms.
What type 1/type 2 processing is not
The terms “System 1 and System 2” suggest just that: two distinct, clearly defined systems with their own distinctive properties and modes of operation. However, there’s no single “System 1”: rather, a wide variety of different processes and systems are lumped together under this term. It is also unclear whether there is any single System 2, either. As a result, a number of researchers including Stanovich himself have switched to talking about “Type 1” and “Type 2” processing instead (Evans, 2012; Evans & Stanovich, 2013; Pennycook, Neys, Evans, Stanovich, & Thompson, 2018).
What exactly defines Type 1 and Type 2 processing?
A variety of attributes have been commonly attributed to either Type 1 or Type 2 processing. However, one criticism is that there is no empirical or theoretical support for the claim that these attributes only occur together with one type of processing. For instance, Melnikoff & Bargh (2018) note that one set of characteristics which has been attributed to Type 1 processing is “efficient, unintentional, uncontrollable, and unconscious”, whereas Type 2 processing has been said to be “inefficient, intentional, controllable and conscious”.
(Before you read on, you might want to take a moment to consider the extent to which this characterization matches your intuition of Type 1 and Type 2 processing. If it does match to some degree, you can try to think of examples which are well-characterized by these types, as well as examples which are not.)
.
.
.
.
.
.
.
.
.
.
They note that this correlation has never been empirically examined, and that there are also various processes in which attributes from both sets co-occur. For example:
- Unconscious (T1) and Intentional (T2). A skilled typist can write sentences without needing to consciously monitor their typing, “but will never start plucking away at their keys without intending to type something in the first place.” Many other skills also remain intentional activities even as one gets enough practice to be able to carry them out without conscious control: driving and playing piano are some examples. Also, speaking involves plenty of unconscious processes, as we normally have very little awareness of the various language-production rules that go into our speech. Yet we generally only speak when we intend to.
- Unconscious (T1) and Inefficient (T2). Unconscious learning can be less efficient than conscious learning. For example, some tasks can be learned quickly using a verbal rule which describes the solution, or slowly using implicit learning so that we figure out how to do the task but cannot give an explicit rule for it.
- Uncontrollable (T1) and Intentional (T2). Consider the bat-and-ball problem: "A bat and a ball cost $1.10 in total. The bat costs $1 more than the ball. How much does the ball cost?" Unless they have heard the problem before, people nearly always generate an initial (incorrect) answer of 10 cents. This initial response is uncontrollable: no experimental manipulation has been found that would cause people to produce any other initial answer, such as 8 or 13 cents. At the same time, the process which causes this initial answer to be produced is intentional: "it is not initiated directly by an external stimulus (the question itself), but by an internal goal (to answer the question, a goal activated by the experimental task instructions). In other words, reading or hearing the bat-and-ball problem does not elicit the 10 cents output unless one intends to solve the problem."
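(As an aside, the correct answer is easy to verify once the problem is written out explicitly; here is a quick check, purely for illustration:)

```python
# Bat-and-ball, worked in cents to avoid floating-point noise:
# ball + bat = 110 and bat - ball = 100, so ball = (110 - 100) / 2 = 5.
total, difference = 110, 100
ball = (total - difference) // 2   # 5 cents
bat = ball + difference            # 105 cents
assert ball + bat == total and bat - ball == difference
print(ball)   # 5 -- five cents, not the intuitive ten
```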
Regarding the last example, Melnikoff & Bargh note:
Ironically, this mixture of intentionality and uncontrollability characterizes many of the biases documented in Tversky and Kahneman’s classic research program, which is frequently used to justify the classic dual-process typology. Take, for example, the availability heuristic, which involves estimating frequency by the ease with which information comes to mind. In the classic demonstration, individuals estimate that more words begin with the letter K than have K in the third position (despite the fact that the reverse is true) because examples of the former more easily come to mind [107]. This bias is difficult to control – we can hardly resist concluding that more words start with K than have K in the third position – but again, all of the available evidence suggests that it only occurs in the presence of an intention to make a judgment. The process of generating examples of the two kinds of words is not activated directly by an external stimulus, but by an internal intention to estimate the relative frequencies of the words. Likewise for many judgments and decisions.
They also give examples of what they consider uncontrollable (T1) but inefficient (T2), unintentional (T1) but inefficient (T2), as well as unintentional (T1) but controllable (T2). Further, they discuss each of the four attributes themselves and point out that they all contain various subdimensions. For example, people whose decisions are influenced by unconscious primes [LW · GW] are conscious of their decision but not of the influence from the prime, meaning that the process has both conscious and unconscious aspects.
Type 1/Type 2 processing as working memory use
Rather than following the “list of necessary attributes” definition, Evans & Stanovich (2013) distinguish between defining features and typical correlates. In previous papers, Evans has generally defined Type 2 processing in terms of requiring working memory resources and being able to think hypothetically. Stanovich, on the other hand, has focused on what he calls cognitive decoupling as the defining feature, an ability which his work shows to be highly correlated with fluid intelligence.
Cognitive decoupling can be defined [? · GW] as the ability to create copies of our mental representations of things, so that the copies can be used in simulations without affecting the original representations. For example, if I see an apple in a tree, my mind has a representation of the apple. If I then imagine various strategies of getting the apple - such as throwing a stone at the tree to knock the apple down - I can mentally simulate what would happen to the apple as a result of my actions. But even as I imagine the apple falling down from the tree, I never end up thinking that I can get the real apple down simply by an act of imagination. This is because the mental object representing the real apple is decoupled from the apple in my hypothetical scenario. I can manipulate the apple in the hypothetical without those manipulations being passed on to the mental object representing the original apple.
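For what it's worth, a loose software analogy for decoupling (entirely my own illustration, not anything from Evans or Stanovich) is running the simulation on a copy of the representation while leaving the original untouched:

```python
import copy

# A hypothetical "mental representation" of the apple: a plain dict standing in
# for whatever the real representation is.
perceived_apple = {"location": "in the tree", "bruised": False}

def simulate_throwing_a_stone(apple_representation):
    """Run a 'what if' on a decoupled copy, leaving the original untouched."""
    simulated = copy.deepcopy(apple_representation)   # the decoupling step
    simulated["location"] = "on the ground"           # imagined consequence
    simulated["bruised"] = True
    return simulated

imagined_apple = simulate_throwing_a_stone(perceived_apple)
print(imagined_apple["location"])   # "on the ground" -- only in the simulation
print(perceived_apple["location"])  # "in the tree"   -- the original is unchanged
```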
In their joint paper, Evans & Stanovich propose to combine their models and define Type 2 processes as those which use working memory resources (closely connected with fluid intelligence) in order to carry out hypothetical reasoning and cognitive decoupling. In contrast, Type 1 processing is anything which does not do that. Various features of thought - such as being automatic or controlled - may tend to correlate more with one or the other type, but these are only correlates, not necessary features.
Type 2 processing as composed of Type 1 components
In previous posts of my multi-agent minds sequence, I have been building up a model of mind that is composed of interacting components. How does it fit together with the proposed Type 1/Type 2 model?
Kahneman in Thinking, Fast and Slow mentions that giving the answer to 2 + 2 = ? is a System (Type) 1 task, whereas calculating 17 * 24 is a System (Type) 2 task. This might be starting to sound familiar. In my post on subagents and neural Turing machines [LW · GW], I discussed Stanislas Dehaene’s model where you do complex arithmetic by breaking up a calculation into subcomponents which can be done automatically, and then routing the intermediate results through working memory. You could consider this to also involve cognitive decoupling: for instance, if part of how you calculate 17 * 24 is by first noting that you can calculate 10 * 24, you need to keep the original representation of 17 * 24 intact in order to figure out what other steps you need to take.
To me, the calculation of 10 * 24 = 240 happens mostly automatically; like 2 + 2 = 4, it feels like a Type 1 operation rather than a Type 2 one. But what this implies, then, is that we carry out Type 2 arithmetic by chaining together Type 1 operations through Type 2 working memory.
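To make this concrete, here is a toy sketch in Python (my own illustration, not Dehaene's actual model): a few automatic lookup-like functions stand in for Type 1 operations, a small dictionary stands in for working memory, and the fixed sequence of steps stands in for Type 2 control.

```python
# Toy sketch of 17 * 24 as Type 1 steps chained through working memory.
# The helper functions stand in for steps that feel automatic (Type 1);
# the dictionary and the fixed sequence of rules stand in for Type 2 control.

def times_ten(x):    # "10 * 24 = 240" -- retrieved more or less automatically
    return 10 * x

def times_seven(x):  # "7 * 24 = 168" -- in practice a few more automatic steps
    return 7 * x

def add(a, b):       # "240 + 168 = 408"
    return a + b

working_memory = {"problem": (17, 24)}          # a decoupled copy of the task
a, b = working_memory["problem"]
working_memory["partial_1"] = times_ten(b)      # rule: split 17 into 10 + 7
working_memory["partial_2"] = times_seven(b)
working_memory["answer"] = add(working_memory["partial_1"],
                               working_memory["partial_2"])
print(working_memory["answer"])                 # 408
```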
I do not think that this is just a special case relating to arithmetic. Rather, it seems like an implication of the Evans & Stanovich definition which they do not mention explicitly, but which is nonetheless relatively straightforward to draw: that Type 2 reasoning is largely built up of Type 1 components.
Under this interpretation, there are some components which are specifically dedicated to Type 2 processes: things like working memory stores and systems for manipulating their contents. But those components cannot do anything alone. The original input to be stored in working memory originates from Type 1 processes (and the act of copying it to working memory decouples it from the original process which produced it), and working memory could do nothing without those Type 1 inputs.
Likewise, there may be something like a component which is Type 2 in nature, in that it holds rules for how the contents of working memory should be transformed in different situations - but many of those transformations happen by firing various Type 1 processes which then operate on the contents of the memory. Thus, the rules are about choosing which Type 1 process to trigger, and could again do little without those processes. (My post on neural Turing machines [LW · GW] explicitly discussed such rules.)
Looking through Kahneman’s examples
At this point, you might reasonably suspect that arithmetic reasoning is an example that I cherry-picked to support my argument. To avoid this impression, I’ll take the first ten examples of System 2 operations that Kahneman lists in the first chapter of Thinking, Fast and Slow and suggest how they could be broken down into Type 1 and Type 2 components.
Kahneman defines System 2 in a slightly different way than we have defined Type 2 operations - he talks about System 2 operations requiring attention - but as attention and working memory are closely related, this still remains compatible with our model. Most of these examples involve somehow focusing attention, and manipulating attention can be understood as manipulating the contents of working memory to ensure that a particular mental object remains in working memory. Modifying the contents of working memory was an important type of production rule discussed in my earlier post.
Starting with the first example in Kahneman’s list:
Brace for the starter gun in a race.
One tries to keep their body in such a position that it will be ready to run when the gun sounds; recognizing the feel of the correct position is a Type 1 operation. Type 2 rules are operating to focus attention on the output of the proprioceptive system, allowing Type 1 processes to notice mismatches with the required body position and correct them. Additionally, Type 2 rules are focusing attention on the sound of the gun, so as to more quickly identify the sound when the gun fires (a Type 1 operation), causing the person to start running (also a Type 1 operation).
Focus attention on the clowns in the circus.
This involves Type 2 rules which focus attention on a particular sensory output, as well as keeping one’s eyes physically oriented towards the clowns. This requires detecting when one’s attention/eyes are on something other than the clowns and then applying an internal (in the case of attention) or external (in the case of eye position) correction. As Kahneman offers “orient to the source of a sudden sound”, “detect hostility in a voice”, “read words on large billboards”, and “understand simple sentences” as Type 1 operations, we can probably say that recognizing something as a clown or not-clown and moving one’s gaze accordingly are Type 1 operations.
Focus on the voice of a particular person in a crowded and noisy room.
As above, Type 2 rules check whether attention is on the voice of that person (a comparison implemented using a Type 1 process), and then adjust focus accordingly.
Look for a woman with white hair.
Similar to the clown example.
Search memory to identify a surprising sound.
It’s unclear to me exactly what is going on here. But introspectively, this seems to involve something like keeping the sound in attention so as to feed it to memory processes, and then applying the rule of “whenever the memory system returns results, compare them against the sound and adjust the search based on how relevant they seem”. The comparison feels like it is done by something like a Type 1 process.
Maintain a faster walking speed than is natural to you.
Monitor the appropriateness of your behavior in a social situation.
Walking: Similar to the “brace for the starter gun” example, Type 2 rules keep calling for a comparison of your current walking speed with the desired one (a Type 1 operation), passing any corrections resulting from that comparison to the Type 1 system controlling your walking speed.
Social behavior: maintain attention on a conscious representation of what you are doing, checking it against various Type 1 processes which contain rules about appropriate and inappropriate behavior. Adjust or block accordingly.
Count the occurrences of the letter a in a page of text.
Focus attention on the letters of a text; when a Type 1 comparison detects the letter “a”, increment a working memory counter by one.
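(In software terms this is the most transparent of the examples; a tiny sketch, just to make the counter-incrementing structure explicit:)

```python
# "Working memory counter" incremented whenever the Type 1 letter-comparison fires.
text = "A page of text would go here."   # placeholder input
count = sum(1 for character in text if character.lower() == "a")
print(count)   # 2 for this placeholder string
```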
Tell someone your phone number.
After a retrieval of the phone number from memory has been initiated, Type 2 rules use Type 1 processes to monitor that it is said in full.
Park in a narrow space (for most people except garage attendants).
Keep attention focused on what you are doing, allowing a series of evaluations, mental simulations, and cached (Type 1) procedural operations to determine how to act at each point in the parking process.
A general pattern in these examples is that Type 2 processing can maintain attention on something as well as hold the intention to invoke comparisons to use as the basis for behavioral adjustments. As comparisons involve Type 1 processes, Type 2 processing is fundamentally reliant on Type 1 processing to be able to do anything.
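As a crude illustration of that pattern (all the function names and numbers below are made up), the walking-speed example might be sketched as a Type 2 loop which repeatedly fires Type 1 comparisons and corrections:

```python
# Toy sketch of the "maintain attention + invoke comparisons" pattern, using the
# walking-speed example. The helper functions are hypothetical stand-ins for
# Type 1 processes; the loop itself stands in for the Type 2 rule holding the goal.

TARGET_SPEED = 1.6   # m/s, a made-up "faster than natural" pace

def sense_current_speed(state):       # Type 1: automatic perception of one's speed
    return state["speed"]

def adjust_gait(state, correction):   # Type 1: automatic motor adjustment
    state["speed"] += correction

def keep_walking_fast(state, steps=5):
    """Type 2 rule: keep the goal in mind, repeatedly fire Type 1 comparisons."""
    for _ in range(steps):
        error = TARGET_SPEED - sense_current_speed(state)   # Type 1 comparison
        adjust_gait(state, 0.5 * error)                      # pass the correction on
    return state

print(keep_walking_fast({"speed": 1.2}))   # speed creeps toward the 1.6 m/s target
```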
Consciousness and dual process theory
Alert readers might have noticed that focusing one’s attention on something involves keeping it in consciousness, whereas the previous Evans & Stanovich definition noted that consciousness is not a defining part of the Type 1/Type 2 classification. Is this a contradiction? Probably not, since as remarked previously, different aspects of the same process may be conscious and unconscious at the same time.
For example, if one intends to say something, one may be conscious of the intention while the actual speech production happens unconsciously; once the words are spoken and one hears them, an evaluation process can run unconsciously but output its results into consciousness. With “conscious” being so multidimensional, it doesn’t seem like a good defining characteristic to use, even if some aspects of it did very strongly correlate with Type 2 processing.
Evans (2012) writes in a manner which seems to me compatible with the notion of there being many different kinds of Type 2 processing, with different processing resources being combined according to different rules as the situation warrants:
The evidence suggests that there is not even a single type 2 system for reasoning, as different reasoning tasks recruit a wide variety of brain regions, according to the exact demands of the task [...].
I think of type 2 systems as ad hoc committees that are put together to deal with a particular problem and then disbanded when the task is completed. Reasoning with abstract and belief-laden syllogisms, for example, recruits different resources, as the neural imaging data indicate: Only the latter involve semantic processing regions of the brain. It is also a fallacy to think of “System 2” as a conscious mind that is choosing its own applications. The ad hoc committee must be put together by some rapid and preconscious process—any feeling that “we” are willing and choosing the course of our thoughts and actions is an illusion [...]. I therefore also take issue with dual-process theorists [...] who assign to System 2 not only the capacity for rule-based reasoning but also an overall executive role that allows it to decide whether to intervene upon or overrule a System 1 intuition. In fact, recent evidence suggests that while people’s brains detect conflict in dual-process paradigms, the conscious person does not.
If you read my neural Turing machines post, you may recall that I noted that the rules which choose what becomes conscious operate below the level of conscious awareness. We may have the subjective experience of being able to choose what thoughts we think, but this is a post-hoc interpretation rather than a fact about the process.
Type 1/Type 2 and bias
People sometimes refer to Type 1 reasoning as biased, and to Type 2 reasoning as unbiased. But as this discussion should suggest, there is nothing that makes one of the two types intrinsically more or less biased than the other. The bias-correction power of Type 2 processing emerges from the fact that if Type 1 operations are known to be erroneous and a rule-based procedure for correcting them exists, a Type 2 operation can be learned which implements that rule.
For example, someone familiar with the substitution principle [LW · GW] may know that their initial answer to a question like “how popular will the president be six months from now?” comes from a Type 1 process which actually answered the question of “how popular is the president right now?”.
They may then have a Type 2 rule saying something like “when you notice that the question you were asked is subject to substitution effects, replace the initial answer with one derived from a particular procedure”. But this still requires a) a Type 1 process recognizing the situation as one where the rule should be applied, b) knowing a procedure which provides a better answer, and c) the cue-procedure rule having been installed previously, itself a process requiring a number of Type 1 evaluations (about e.g. how rewarding it would be to have such a rule in place).
There is nothing to say that somebody couldn’t learn an outright wrong Type 2 rule, such as “whenever you think of 2+2 = 4, substitute your initial answer of ‘4’ with a ‘5’”.
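To sketch what such cue-procedure rules might look like in software terms (a deliberately toy illustration; both rules below are made up, including the obviously wrong one):

```python
# Sketch of Type 2 "cue -> correction procedure" rules. Detecting the cue would
# itself be a Type 1 process; the table just stores what to do once a cue fires.
# Both rules are invented for illustration, including the deliberately wrong one.

correction_rules = {
    "substitution: future popularity -> current popularity":
        lambda initial: "re-derive from base rates and trends, not current mood",
    "arithmetic: 2 + 2":
        lambda initial: 5,   # nothing prevents an outright wrong rule from being learned
}

def maybe_correct(cue, initial_answer):
    rule = correction_rules.get(cue)             # has a rule been installed for this cue?
    return rule(initial_answer) if rule else initial_answer

print(maybe_correct("arithmetic: 2 + 2", 4))     # 5 -- faithfully applying a bad rule
print(maybe_correct("no known cue", 4))          # 4 -- the Type 1 answer stands
```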
Often, it is also unclear what the better Type 2 rule even should be. For instance, another common substitution effect is that when someone is asked “How happy are you with your life these days?”, they actually answer the question of “What is my mood right now?”. But what is the objectively correct procedure for evaluating your current happiness with life?
On the topic of Type 1/2 and bias, I give the final word to Evans (2012):
One of the most important fallacies to have arisen in dual-process research is the belief that the normativity of an answer [...] is diagnostic of the type of processing. Given the history of the dual-process theory of reasoning, one can easily see how this came about. In earlier writing, heuristic or type 1 processes were always the “bad guys,” responsible for cognitive biases [...]. In belief bias research, authors often talked about the conflict between “logic” and “belief,” which are actually dual sources, rather than dual processes. Evans and Over [...] defined “rationality2” as a form of well-justified and explicit rule-based reasoning that could only be achieved by type 2 processes. Stanovich [...] in his earlier reviews of his psychometric research program emphasized the association between high cognitive ability, type 2 processing and normative responding. Similarly, Kahneman and Frederick [...] associate the heuristics of Tversky and Kahneman with System 1 and successful reasoning to achieve normatively correct solutions to the intervention of System 2.
The problem is that a normative system is an externally imposed, philosophical criterion that can have no direct role in the psychological definition of a type 2 process. [...] if type 2 processes are those that manipulate explicit representations through working memory, why should such reasoning necessarily be normatively correct? People may apply the wrong rules or make errors in their application. And why should type 1 processes that operate automatically and without reflection necessarily be wrong? In fact, there is much evidence that expert decision making can often be well served by intuitive rather than reflective thinking [...] and that sometimes explicit efforts to reason can result in worse performance [...].
Reasoning research somewhat loads the dice in favor of type 2 processing by focusing on abstract, novel problems presented to participants without relevant expertise. If a sports fan with much experience of following games is asked to predict results, he or she may be able to do so quite well without need for reflective reasoning. However, a participant in a reasoning experiment is generally asked to do novel things, like assuming some dubious propositions to be true and deciding whether a conclusion necessarily follows from them. In these circumstances, explicit type 2 reasoning is usually necessary for correct solution, but certainly not sufficient. Arguably, however, when prior experience provides appropriate pragmatic cues, even an intractable problem like the Wason selection task becomes easy to solve [...], as this can be done with type 1 processes [...]. It is when normative performance requires the deliberate suppression of unhelpful pragmatic cues that higher ability participants perform better under strict deductive reasoning instructions [...].
Hence, [the fallacy that type 1 processes are responsible for cognitive biases and type 2 processes for normatively correct reasoning] is with us for some fairly precise historical reasons. In the traditional paradigms, researchers presented participants with hard, novel problems for which they lacked experience (students of logic being traditionally excluded), and also with cues that prompted type 1 processes to compete or conflict with these correct answers. So in these paradigms, it does seem that type 2 processing is at least necessary to solve the problems, and that type 1 processes are often responsible for cognitive biases. But this perspective is far too narrow, as has recently been recognized. In recent writing, I have attributed responsibility for a range of cognitive biases roughly equally between type 1 and type 2 processing [...]. Stanovich [...] similarly identifies a number of reasons for error other than a failure to intervene with type 2 reasoning; for example, people may reason in a quick and sloppy (but type 2) manner or lack the necessary “mindware” for successful reasoning.
Summary and connection to the multiagent models of mind sequence
In this post, I have summarized some recent-ish academic discussion on dual-process models of thought, or what used to be called System 1 and System 2. I noted that the popular conception of them as two entirely distinct systems with very different properties is mistaken. While there is a defining difference between them - namely, the use of working memory resources to support hypothetical thinking and cognitive decoupling - the terms are better understood as referring to two types of processing, either of which may draw on very different kinds of underlying systems.
It is worth noting at this point that there are many different dual-process models in different parts of psychology. The Evans & Stanovich model which I have been discussing here is intended as a generalized model of them, but as they themselves (2013) write:
… we defend our view that the Type 1 and 2 distinction is supported by a wide range of converging evidence. However, we emphasize that not all dual-process theories are the same, and we will not act as universal apologists on each one’s behalf. Even within our specialized domain of reasoning and decision making, there are important distinctions between accounts. S. A. Sloman [...], for example, proposed an architecture that has a parallel-competitive form. That is, Sloman’s theories and others of similar structure [...] assume that Type 1 and 2 processing proceed in parallel, each having their say with conflict resolved if necessary. In contrast, our own theories [...] are default-interventionist in structure [...]. Default-interventionist theories assume that fast Type 1 processing generates intuitive default responses on which subsequent reflective Type 2 processing may or may not intervene.
In previous posts of the multi-agent models of mind sequence [? · GW], I have been building up a model of the mind as composed of a variety of subsystems (which might in some contexts be called subagents).
In my discussion of Consciousness and the Brain [LW · GW], I summarized some of its conclusions as saying that:
- The brain has multiple subagents doing different things; many of the subagents do unconscious processing of information. When a mental object becomes conscious, many subagents will synchronize their processing around analyzing and manipulating that mental object.
- The collective of subagents can only have their joint attention focused on one mental object at a time.
- The brain can be compared to a production system, with a large number of subagents carrying out various tasks when they see the kinds of mental objects that they care about. E.g. when doing mental arithmetic, the subagents apply the right sequence of mental operations for achieving the main goal.
In Building up to an Internal Family Systems model [LW · GW], I used this foundation to discuss the IFS model of how various subagents manipulate consciousness in order to achieve various kinds of behavior. In Subagents, neural Turing machines, thought selection, and blindspots [LW · GW], I talked about the mechanistic underpinnings of this model and how processes like thought selection and firing of production rules might actually be implemented.
What had been lacking so far was a connection between these models and the Type 1/Type 2 typology. However, if we take something like the Evans & Stanovich model of Type 1/Type 2 processing to be true, then it turns out that our discussion has been connected with their model all along. Already in “Consciousness and the Brain”, I mentioned the “neural Turing machine” passing on results from one subsystem to another through working memory. That, it turns out, is the defining characteristic of Type 2 processing - with Type 1 processing simply being any process which does not do that.
Under this model, then, Type 2 processing is a particular way of chaining together the outputs of various Type 1 subagents using working memory. Some of the processes involved in this chaining are themselves implemented by particular kinds of subagents.
References
Evans, J. S. B. T. (2012). Dual process theories of deductive reasoning: facts and fallacies. The Oxford Handbook of Thinking and Reasoning, 115–133.
Evans, J. S. B. T., & Stanovich, K. E. (2013). Dual-Process Theories of Higher Cognition: Advancing the Debate. Perspectives on Psychological Science: A Journal of the Association for Psychological Science, 8(3), 223–241.
Melnikoff, D. E., & Bargh, J. A. (2018). The Mythical Number Two. Trends in Cognitive Sciences, 22(4), 280–293.
Pennycook, G., Neys, W. D., Evans, J. S. B. T., Stanovich, K. E., & Thompson, V. A. (2018). The Mythical Dual-Process Typology. Trends in Cognitive Sciences, 22(8), 667–668.
23 comments
comment by MalcolmOcean (malcolmocean) · 2019-10-02T22:07:23.053Z · LW(p) · GW(p)
Mmmm I'm glad you've written this up. It feels to me like the first half of what needs to be done with the concepts of System 1 and System 2, which is dissolving the original meanings (from Kahneman etc).
The second half is... inasmuch as these concepts have become popular, what experiences are people using them to point at? It seems to me that there are actually multiple things happening here. Among rationalists, "System 2" is used to refer to thinking that is verbal, explicit, logical, procedural, linear. "System 1" is kind of a catch-all for... most of the rest of thinking.
There's a much more obvious mapping for this in the brain, but...
...it's unfashionable to talk about.
It's the left and right hemispheres. The left hemisphere indeed operates in a manner that is verbal, explicit, logical, procedural, linear. The right hemisphere operates in a more intuitive, parallel, context-sensitive, uniqueness-oriented way. (I'm summarizing here, but this is based on extensive reading of an excellent meta-analysis of decades of research, that notes that the classic notion of left/right as reason/emotion isn't accurate but that there's something really important going on.)
If you squint, you can kind of imagine the Right hemisphere as Type 1 and the Left hemisphere as Type 2 Processing, but it's importantly different. Neither hemisphere is necessarily automatic or faster or more abstract than the other. Both are, in different senses, conscious, although it seems possible that many people primarily identify with their left hemisphere consciousness.
To illustrate this difference: many of the examples in this post describe categorization ("count the letter a", "this voice, not that voice", etc) as being a Type 1 activity, but categorization is primarily a feature of the Left hemisphere's world. "How many people are in this room right now?" requires this categorization in order to answer. People are reduced to fungible, countable objects; if one exited and another entered, the total doesn't change. This is useful! "What's the mood of this room?" is much harder to answer with this sort of categorization applied. It's a much better question for a Right hemisphere, which can look at the whole scene and capture it in a metaphor or a few adjectives.
So when a rationalist is saying "My S1 has this intuition, but I can't explain it to my System 2", then one thing they might mean is that their Right hemisphere understands something, but is unable to articulate it in a way that their Left hemisphere could understand verbally & linearly and in terms of the concepts and categories that are familiar to it.
The other thing they might mean, however, is not a left-right distinction but a top-bottom or front-back distinction. This is what would be meant in the context of "My System 2 knows not all dogs are scary, but my System 1 is still afraid of them since I got bitten when I was 7." This is much better modelled not with S2/S1, but by talking about the neocortex as opposed to the limbic system.
The process that instantly computes 2+2=4 is very different from the process that releases a bunch of adrenaline at the sound of a bark. Are they both Type 1? Seems so, but I don't know this model well enough.
My impression is that the left neocortex (which many rationalists identify as their "System 2") tends to categorize everything else as, well, everything else. And this gets labelled "System 1". It makes sense to me that the left neocortex would treat everything else as a big blob, because it first & foremost makes a categorical distinction between its coherent articulable worldview (which it trusts at a level before awareness) and, well, everything else.
I've been playing with this model for a few months, and so far have found a couple clear distinctions between "right hemisphere" and "emotional brain / limbic system / amygdala etc".
One is about simplicity & complexity:
- the right (neocortex?) is context-sensitive & aware of complex factors ("that person seems upset... oh, they had a date last night and they were nervous about it, I wonder how it went")
- the emotional brain is simple and atemporal ("I feel scared because you spoke loudly like my father did when drunk" (that's what it's thinking, although it can't necessarily actually put any of that into words))
Another is that a right hemisphere intuition tends to know that it can't articulate itself and have a sort of relaxed confidence about that, whereas if something is stimulating emotion then you can easily generate all sorts of explanations (which may be accurate or confabulations) for why your "intuition" is correct.
I've found Iain McGilchrist's book on the brain hemispheres, which I've been reading recently, to be one of the most important books I've read in my life. His model of hemispheres has profound implications in relation to everything from rationalization & motivated reasoning, to social dynamics & coordination, to technological development & science, to AI Safety & corrigibility.
I think if everyone here had it as a common framework, we'd be able to talk more fruitfully about many of the major tensions that have shown up on LessWrong in the past few years regarding post-rationality/meta-rationality, kenshō/Looking/etc, and more.
Links to learn more:
- McGilchrist's book, The Master and his Emissary (This is the meta-analysis I was referring to. The first chapter alone has like 500 citations.)
- an excellent 44 minute podcast introduction
- a meta-thread of tweetstorms of mine on the subject
(I may edit this into a top-level at some point; for now I'm going to leave it as a comment)
↑ comment by Kaj_Sotala · 2019-10-03T07:29:23.384Z · LW(p) · GW(p)
I've seen mentions of McGilchrist's work pop up every now and then, but I'm still unclear on what exactly his model adds. In particular, I'm a little cautious of the illusion of understanding that may pop up whenever people just add neuroscience terms to an explanation which wouldn't really need them.
E.g. "seeing a dog makes you feel afraid" versus "seeing a dog causes your amygdala to sound an alarm, making you feel afraid". The second sentence basically just names a particular part of the brain which is involved in the fear response. This isn't really a piece of knowledge that most people would do anything with (and it should hopefully have been obvious without saying that some part of the brain is involved in the response), but it can still feel like it had substantially more information.
A lot of the summaries of McGilchrist that I've seen so far raise similar alarms for me. You suggest that if everyone had his model as a common framework, then talking about various things would become more productive. But as far as I can tell, your description of his work just associates various mental processes with different parts of the brain. What's the benefit of saying "the verbal and explicit mode of thought, which is associated with the left hemisphere", as opposed to just "the verbal and explicit mode of thought"?
↑ comment by MalcolmOcean (malcolmocean) · 2019-10-04T06:09:59.923Z · LW(p) · GW(p)
Here's one example of a benefit: the left hemisphere is known to have major blindspots that aren't implied simply by saying "the verbal and explicit mode of thought." Quoting McGilchrist (not sure about page number, I'm looking at Location 5400 in the 2nd edition on Kindle) describing some tests done by temporarily deactivating one hemisphere and then the other, in healthy individuals:
Take the following example of a syllogism with a false premise:
- Major premise: all monkeys climb trees;
- Minor premise: the porcupine is a monkey;
- Implied conclusion: the porcupine climbs trees.
Well — does it? As Deglin and Kinsbourne demonstrated, each hemisphere has its own way of approaching this question. At the outset of their experiment, when the intact individual is asked "Does the porcupine climb trees?", she replies (using, of course, both hemispheres): "It does not climb, the porcupine runs on the ground; it's prickly, it's not a monkey." [...] During experimental temporary hemisphere inactivations, the left hemisphere of the very same individual (with the right hemisphere inactivated) replies that the conclusion is true: "the porcupine climbs trees since it is a monkey." When the experimenter asks, "But is the porcupine a monkey?", she replies that she knows it is not. When the syllogism is presented again, however, she is a little nonplussed, but replies in the affirmative, since "That's what is written on the card." When the right hemisphere of the same individual (with the left hemisphere inactivated) is asked if the syllogism is true, she replies: "How can it climb trees — it's not a monkey, it's wrong here!" If the experimenter points out that the conclusion must follow from the premises stated, she replies indignantly: "But the porcupine is not a monkey!"
In repeated situations, in subject after subject, when syllogisms with false premises, such as "All trees sink in water; balsa is a tree; balsa wood sinks in water," or "Northern lights are often seen in Africa; Uganda is in Africa; Northern lights are seen in Uganda", are presented, the same pattern emerges. When asked if the conclusion is true, the intact individual displays a common sense reaction: "I agree it seems to suggest so, but I know in fact it's wrong." The right hemisphere dismisses the false premises and deductions as absurd. But the left hemisphere sticks to the false conclusion, replying calmly to the effect that "that's what it says here."
In the left-hemisphere situation, it prioritizes the system, regardless of experience: it stays within the system of signs. Truth, for it, is coherence, because for it there is no world beyond, no Other, nothing outside the mind, to correspond with. "That's what it says here." So it corresponds with itself: in other words, it coheres. The right hemisphere prioritises what it learns from experience: the real state of existing things "out there". For the right hemisphere, truth is not mere coherence, but correspondence with something other than itself. Truth, for it, is understood in the sense of being "true" to something, faithfulness to whatever it is that exists apart from ourselves.
However, it would be wrong to deduce from this that the right hemisphere just goes with what is familiar, adopting a comfortable conformity with experience to date. After all, one's experience to date might be untrue to reality: then paying attention to logic would be an important way of moving away from false customary assumption. And I have emphasized that it is the right hemisphere that helps us to get beyond the inauthentically familiar. The design of the above experiment specifically tests what happens when one is forced to choose between two paths to the truth in answering a question: using what one knows from experience or following a syllogism where the premises are blatantly false. The question was not whether the syllogism was structurally correct, but what actually was true. But in a different situation, where one is asked the different question "Is this syllogism structurally correct?", even when the conclusion flies in the face of one's experience, it is the right hemisphere which gets the answer correct, and the left hemisphere which is distracted by the familiarity of what it already thinks it knows, and gets the answer wrong. The common thread here is the role of the right hemisphere as "bullshit detector". In the first case (answering the question "What is true here?") detecting the bullshit involves using common sense. In the second case (answering "Is the logic here correct?"), detecting the bullshit involves resisting the obvious, the usual train of thought.
For me personally, working with McGilchrist's model has dramatically improved my own internal bullshit-detection capacity. I've started to be able to sometimes smell the scent of rationalizations, even while the thoughts I'm having continue to feel true. This has been helpful for learning, for noticing when I'm being a jerk in relationships, and for noticing how I'm closing myself off to some line of thinking while debugging my code.
And the bullshit detection thing is just one element of it. The book relates dozens of other case studies on differences in way-of-being-and-perceiving of each hemisphere, and connects them with some core theories about the role of attention in cognition.
If you were surprised in reading this comment to discover that it's not the left hemisphere that is best at syllogisms, then I would like to suggest there are important things that are known about the brain that you could learn by reading this book, that would help you think more effectively. (This also applies if you were not-particularly-surprised because your implicit prior was simply "hemispheres are irrelevant"; I was more in this camp.)
↑ comment by Kaj_Sotala · 2019-10-04T08:52:25.784Z · LW(p) · GW(p)
Thanks! That does indeed sound valuable. Updated towards wanting to read that book.
↑ comment by Lukas Finnveden (Lanrian) · 2019-10-04T10:47:35.781Z · LW(p) · GW(p)
But in a different situation, where one is asked the different question "Is this syllogism structurally correct?", even when the conclusion flies in the face of one's experience, it is the right hemisphere which gets the answer correct, and the left hemisphere which is distracted by the familiarity of what it already thinks it knows, and gets the answer wrong.
Wait what? Surely the left/right hemispheres are accidentally reversed here? Or is the book saying that the left hemisphere always answers incorrectly, no matter what question you ask?
↑ comment by MalcolmOcean (malcolmocean) · 2019-10-05T21:30:38.411Z · LW(p) · GW(p)
The book is saying that the left hemisphere answers incorrectly, in both cases! As I said, this is surprising.
I haven't looked at the original research and found myself curious what would happen with a syllogism that is both invalid and has a false conclusion. My assumption is that either hemisphere would reject something like this:
- Some cows are brown.
- Some fish are iridescent.
- Some cows are iridescent.
The left hemisphere seems to be where most of motivated cognition lives. If you've heard the bizarre stories about patients confabulating after strokes (eg "my limb isn't paralyzed, I just don't want to move it"), this is almost unilaterally associated with damage to the right hemisphere. Many people, following Gazzaniga's lead, seem to have assumed this was just because someone with a left hemisphere stroke can't talk, but if you leave words aside, it is apparent that people with left hemisphere damage are distressed about their paralyzed right arm, whereas people with right hemisphere damage are often in denial.
Likewise, part of the job of a well-functioning left hemisphere is to have blindspots. It's so zoomed in on whatever it's focused on that the rest of the world might as well not exist. If you've heard of the term "hemispatial neglect", that leads to people shaving only half of their face, eating only half of their plate, or attempting to copy a drawing of an ordinary clock and ending up drawing something like this:
[image: a copied clock face with all the numbers crowded onto one side, the other half left blank]
...then that's again something that only happens when the left hemisphere is operating without the right (again, can also be shown in healthy patients by temporarily deactivating one hemisphere). The left hemisphere has a narrow focus of attention and only on the right side of things, and it doesn't manage to even notice that it has omitted the other half, because as far as it's concerned, the other half isn't there. This is not a vision thing—asked to recall a familiar scene, such a patient may describe only the right half of it.
↑ comment by Lukas Finnveden (Lanrian) · 2019-10-06T10:28:15.740Z · LW(p) · GW(p)
The book is saying that the left hemisphere answers incorrectly, in both cases! As I said, this is surprising.
That's not just surprising, that's absurd. I can absolutely believe the claim that the left hemisphere always takes what's written for granted, and solves the syllogism formally. But the claim here is that the left hemisphere pays careful attention to the questions, solves them correctly, and then reverses the answer. Why would it do that? No mechanism is proposed.
I looked at the one paper that's mentioned in the quote (Deglin and Kinsbourne), and they never ask the subjects whether the syllogisms are 'structurally correct'; they only ask about the truth. And their main conclusion is that the left hemisphere always solves syllogisms formally, not that it's always wrong.
If you've heard the bizarre stories about patients confabulating after strokes (eg "my limb isn't paralyzed, I just don't want to move it"), this is almost unilaterally associated with damage to the right hemisphere.
Interesting, I didn't know this only happened with the left hemisphere intact.
↑ comment by MalcolmOcean (malcolmocean) · 2019-10-06T23:15:44.150Z · LW(p) · GW(p)
the claim here is that the left hemisphere pays careful attention to the questions, solves them correctly, and then reverses the answer.
Fwiw I also think that that is an absurd claim and I also think that nobody is actually claiming that here. The claim is something more like what has been claimed about System 1, "it takes shortcuts", except in this case it's roughly "to the left hemisphere, truth is coherence; logical coherence is preferred before map coherence, but both are preferred to anything that appears incoherent."
I looked up the source for the "However" section and it's not Deglin and Kinsbourne but Goel and Dolan (2003). I found it hard to read, but my sense is that what it's saying is:
- In general, people are worse at judging the validity of a logical syllogism when it contradicts their beliefs. (This should surprise nobody.)
- Different parts of the brain appear to be recruited depending on whether the content of a syllogism is familiar:
A recent fMRI study (Goel, Buchel, Frith & Dolan, 2000) has provided evidence that syllogistic reasoning is implemented in two distinct brain systems whose engagement is primarily a function of the presence or absence of meaningful content. During content-based syllogistic reasoning (e.g. All apples are red fruit; All red fruit are poisonous; ∴ All apples are poisonous) a left hemisphere frontal and temporal lobe system is recruited. By contrast, in a formally identical reasoning task with arbitrary content (e.g. All A are B; All B are C; ∴ All A are C) a bilateral parietal system is recruited.
(Note: this is them analyzing what part of the brain is recruited when the task is completed successfully.)
- This 2003 study investigates whether that's about [concrete vs abstract content] vs [belief-laden vs belief neutral content] and concludes that it's about beliefs, and also < something new about the neuroanatomy >.
I think what's being implied by McGilchrist citing this paper (although it's unclear to me if this was tested as directly as the Deglin & Kinsbourne study) is that without access to the right hemisphere, the left hemisphere's process would be even more biased, or something.
I'd be interested in your take if you read the 2000 or 2003 papers.
↑ comment by Richard_Kennaway · 2019-10-03T17:27:23.874Z · LW(p) · GW(p)
McGilchrist himself has said that it doesn't matter if the neuroscience is all wrong, it makes a good metaphor. See this review, where McGilchrist's "The Master and His Emissary" is quoted:
“If it could eventually be shown…that the two major ways, not just of thinking, but of being in the world, are not related to the two cerebral hemispheres, I would be surprised, but not unhappy. Ultimately what I have tried to point to is that the apparently separate ‘functions’ in each hemisphere fit together intelligently to form in each case a single coherent entity; that there are, not just currents here and there in the history of ideas, but consistent ways of being that persist across the history of the Western world, that are fundamentally opposed, though complementary, in what they reveal to us; and that the hemispheres of the brain can be seen as, at the very least, a metaphor for these…
What [Goethe’s Faust, Schopenhauer, Bergson, Scheler and Kant] all point to is the fundamentally divided nature of mental experience. When one puts that together with the fact that the brain is divided into two relatively independent chunks which just happen broadly to mirror the very dichotomies that are being pointed to – alienation versus engagement, abstraction versus incarnation, the categorical versus the unique, the general versus the particular, the part versus the whole, and so on – it seems like a metaphor that might have some literal truth. But if it turns out to be ‘just’ a metaphor, I will be content. I have a high regard for metaphor. It is how we come to understand the world.”
In which case, why is he peddling it? He is asserting the neuroscience as true. It matters whether it is true, because without it, he's just another purveyor of intellectual artistic ramblings, like the ones he admires. And isn't that dichotomising, in his terms, a left-brain thing to do?
I think that pretty much cuts the ground from under his whole system. It reduces the neuroscience story to a noble lie.
↑ comment by MalcolmOcean (malcolmocean) · 2019-10-04T06:19:39.405Z · LW(p) · GW(p)
Brief reply about dog thing & just naming a part of the brain—I agree!
But saying "system 1" is also not useful unless you have a richer map of how system 1 works. In order for the emotional brain model to be useful, you need affordances for working with it. I got mine from the Bio-Emotive Framework, and since learning that model & technique, I've been more able to work with this stuff, whatever you want to call it, and part of working with it involves a level of identifying what's going on at a level of detail beyond "S1". There are also of course methods of working with this stuff that don't require such a framework!
I'm appreciating you pointing this out, since it represents a way in which my comment was unhelpful—I didn't actually give people these richer models, I just said "there are models out there that I've found much better than S1/S2 for talking about the same stuff". I've pitched the models, but not actually shared much of why I find them so useful. Although I've just added a long comment elaborating on some hemisphere stuff [LW(p) · GW(p)] so hopefully that helps.
comment by Raemon · 2019-11-12T01:13:50.562Z · LW(p) · GW(p)
Curated.
System 1 and 2 have gotten a lot of attention on LessWrong over the years, informing many explicit models as well as vague folk-theories circulating the community. Given that the originator of the concept has deprecated it, it seems particularly important for the LW community to have common knowledge of the state of the literature.
comment by Steven Byrnes (steve2152) · 2020-12-13T13:40:05.265Z · LW(p) · GW(p)
I found this post super useful for clearing out old bad ideas about how thought works and building a better model in its place, which is now thoroughly part of my worldview. I talk about this post and link to it constantly. As one concrete example, in my post Can You Get AGI From a Transformer [LW · GW], there's a spot where I link to this post. It doesn't take up a lot of space in the post, but it's carrying a lot of weight behind the scenes.
comment by Hazard · 2020-12-03T23:19:54.917Z · LW(p) · GW(p)
The S1/S2 dichotomy has proven very unhelpful for me.
- For some time it served as my "scientific validation" for taking a coercive-authoritarian attitude towards myself, resulting in plenty of pain.
- It's really easy to conflate S2 with "rational" with "gets correct answers". I now think that "garbage in -> garbage out" applies to S2. You can learn a bunch of explicit procedural thinking patterns that are shit at getting things right.
- In general, S1/S2 encourages conflating "motives" and "cognitive capacities". "S1 is fast and biased and S2 is slow and rational". If you instead think of slow/fast, intentional/unintentional, and biased/rational as separate axes, you are capable of doing cognition that combines any of these qualities. Unnecessarily grouping them together makes it easier to spin narratives where one "system" is a bad guy that must be overcome, and that's just not how your brain works.
This post (along with the rest of Kaj's amazing sequence) was a crucial nudge away from the S1/S2 frame and towards a way more gearsy model of the mind.
comment by MaxRa · 2019-11-14T10:59:37.129Z · LW(p) · GW(p)
I always understood bias to mean systematic deviations from the correct response (as in the bias-variance decomposition [1], e.g. a bias to be more overconfident, or the bias of being anchored to arbitrary numbers). I read your and Evans' interpretation of it more like bias meaning incorrect in some areas. As Type 2 processing seems to be very flexible and unconstrained, I thought that it might not necessarily be biased but simply sufficiently unconstrained and high variance to cause plenty of errors in many domains.
[1] https://miro.medium.com/max/2567/1*CgIdnlB6JK8orFKPXpc7Rg.png
PS: Thanks for your writing, I really enjoy it a lot.
comment by Alexey Lapitsky (alexey-lapitsky) · 2019-09-30T21:21:22.111Z · LW(p) · GW(p)
Thank you for such a well-structured and concise summary of the research! I really like this sequence.
Pretty interesting to see where all of that could lead from the evolutionary perspective (working memory / type 2 processes in animals) and from the mental disorder perspective.
↑ comment by Kaj_Sotala · 2019-10-03T08:39:31.950Z · LW(p) · GW(p)
Thanks! Glad you like it.
comment by lolobo · 2020-04-13T22:04:57.658Z · LW(p) · GW(p)
I finished "Thinking fast and slow" a few days ago, and I have remarks.
1) You say
The terms “System 1 and System 2” suggest just that: two distinct, clearly defined systems with their own distinctive properties and modes of operation.
Kahneman is aware of that and tries to prevent it. A quote from the conclusion of the book :
And of course you also remember that the two systems do not really exist in the brain or anywhere else. “System 1 does X” is a shortcut for “X occurs automatically.” And “System 2 is mobilized to do Y” is a shortcut for “arousal increases, pupils dilate, attention is focused, and activity Y is performed.”
2) Either there is something I don't understand or the examples taken by Melnikoff & Bargh (2018) seem to lack charity. In the bat-and-ball problem, the task of "computing the price of the ball by subtracting the total cost and the difference between the two prices" is unintentional; there is not a moment when it was decided. This is what is meant by "the task is performed unintentionally". Similarly, in the availability heuristic the task "estimate whether more words begin with the letter K than have K in the third position by assessing how many words of each kind I can remember" is not intentional. The substitution, attributed to System 1, is not intentional.
3) I don't know if the quotes from Evans (2012) are directed toward Kahneman, but "why should [type 2] reasoning necessarily be normatively correct? [...] why should type 1 processes that operate automatically and without reflection necessarily be wrong?" has nothing to do with what the book says, and you explain quite well why System 1 will make more mistakes. I don't see what this adds to the rest.
4) Similarly, there are two chapters of Thinking Fast and Slow (21 and 22) dedicated to the good and bad of expert intuitions, so the remark about "there is much evidence that expert decision making can often be well served by intuitive rather than reflective thinking" seems out of place
I found your text interesting and quite informative but I don't want people to have the idea that Thinking Fast and Slow is letting important things like this slip.
comment by romeostevensit · 2019-09-25T21:26:28.835Z · LW(p) · GW(p)
Relevant: https://en.wikipedia.org/wiki/Global_workspace_theory
comment by SilverFlame · 2023-04-30T00:57:41.714Z · LW(p) · GW(p)
Under this model, then, Type 2 processing is a particular way of chaining together the outputs of various Type 1 subagents using working memory. Some of the processes involved in this chaining are themselves implemented by particular kinds of subagents.
Something I have encountered in my own self-experiments and tinkering is Type 2 processes that chain together other Type 2 processes (and often some Type 1 subagents as well), meshing well with persistent Type 2 subagents that get re-used due to their practicality and sometimes end up resembling Type 1 subagents as their decision process becomes reflexive to repeat.
Have you encountered anything similar?
↑ comment by Kaj_Sotala · 2023-05-01T06:48:22.566Z · LW(p) · GW(p)
Probably, but this description is abstract enough that I have difficulty generating examples. Do you have a more concrete example?
↑ comment by SilverFlame · 2023-05-01T11:54:03.959Z · LW(p) · GW(p)
The most notable example of a Type 2 process that chains other Type 2 processes as well as Type 1 processes is my "path to goal" generator, but as I sit here to analyze it I am surprised to notice that much of what used to be Type 2 processing in its chain has been replaced with fairly solid Type 1 estimators with triggers for when you leave their operating scope. I am noticing that what I thought started as Type 2s that call Type 2s now looks more like Type 2s that set triggers via Type 1s to cause other Type 2s to get a turn on the processor later. It's something of an indirect system, but the intentionality is there.
My visibility into the current intricacies of my pseudo-IFS is currently low due to the energy costs of maintaining such visibility, and circumstances do not make regaining it feasible for a while. As a result, I find myself having some difficulty identifying any specific processes that are Type 2 that aren't super implementation-specific and vague on the intricacies. I apologize for not having more helpful details on that front.
I have something a bit clearer as an example of what started as Type 2 behavior and transitioned to Type 1 behavior. I noticed at one point that I was calculating gradients in a timeframe that seemed automatic. Later investigation seemed to suggest that I had ended up with a Type 1 estimator that could handle a number of common data forms that I might want gradients of (it seems to resemble Riemann sums), and I have something of a felt sense for whether the form of data I'm looking at will mesh well with the estimator's scope.
↑ comment by Kaj_Sotala · 2023-05-01T15:14:04.795Z · LW(p) · GW(p)
At least Type 2 behavior turning into Type 1 behavior is a pretty common thing in skill learning; the classic example I've heard cited is driving a car, which at first is very effortful and requires a lot of conscious thought, but then gradually things get so automated that you might not even remember most of your drive home. But the same thing can happen with pretty much any skill; at first it's difficult and requires Type 2 processing, until it's familiar enough to become effortless.
comment by MaxRa · 2019-11-14T11:13:16.548Z · LW(p) · GW(p)
I haven't read Stanovich's papers you refer to, but in his book "Rationality and the Reflective Mind" he proposes a separation of Type 2 processing into 1) serial associative cognition with a focal bias and 2) fully decoupled simulations for alternative hypotheses. (Just noting it because I found it useful for my own thinking.)
In fact, an exhaustive simulation of alternative worlds would guarantee correct responding in the [Wason selection] task. Instead [...] subjects accept the rule as given, assume it is true, and simply describe how they would go about verifying it. They reason from a single focal model—systematically generating associations from this focal model but never constructing another model of the situation. This is what I would term serial associative cognition with a focal bias. It is how I would begin to operationalize the satisficing bias in Type 2 processing posited in several papers by Evans (2006b; Evans, Over, & Handley, 2003).